Identification and genetic diversity analysis of hybrid offspring of azalea based on EST-SSR markers

Azalea is a world-famous flower with high economic and ornamental value. New azalea varieties are bred mainly by cross breeding. However, contamination by exogenous pollen during crossing can produce false hybrid offspring. Therefore, the identification of hybrid offspring is an important part of azalea breeding. In this study, the parents of three hybrid combinations and their 88 F1 hybrid offspring were used to screen 15 pairs of EST-SSR primers for identifying the authenticity of azalea hybrid offspring. The results showed that the authenticity of all 88 azalea F1 hybrid progenies could be determined with as few as four primer pairs. Genetic diversity analysis of the hybrid progeny revealed that the number of alleles and the polymorphic information content of the progeny increased to different degrees, and that the greater the genetic distance between parents, the richer the polymorphic information. These results suggest that EST-SSR molecular markers can be applied to the early identification and genetic diversity analysis of azalea hybrid progeny. This method is valuable for improving the breeding efficiency of new varieties and exploring the genetic background of azalea.


Ming Su 1 , Chunying Zhang 1* & Shucheng Feng 2
Azaleas are an important group of the genus Rhododendron in the family Ericaceae and are world-famous flowers 1 . They have important economic and ornamental value and occupy an important position in urban greening and landscaping 2 . At present, there are more than 1,000 species of Rhododendron and more than 30,000 varieties in the world [3][4][5] . Under natural conditions, the sexual reproduction of Rhododendron depends on visits from pollinators 6 . Rhododendron varieties are bred mainly through cross breeding, bud breeding and polyploid breeding, among which cross breeding produces most of the new azalea varieties 7 . Azalea is a cross-pollinated woody plant with a long juvenile period of 3-10 years from sowing to flowering 8 , and it is susceptible to contamination by foreign pollen during the cross-breeding process, resulting in a mixture of true and false hybrid offspring. If these false hybrids are not screened out and removed in time, land, manpower and material resources are wasted and breeding efficiency is reduced. Therefore, early identification of the authenticity of hybrids is a very important step in azalea breeding.
With the continuous development of molecular technology, molecular markers have increasingly been used to identify hybrid progeny because they are rapid and accurate. In particular, EST-SSR (Expressed Sequence Tag-Simple Sequence Repeat) markers, which offer codominant inheritance, good reproducibility and versatility, have been used for hybrid identification 9,10 and are considered a relatively ideal method for this purpose 11 . EST-SSR identification of hybrid progeny has been applied successfully in woody plants such as Litchi chinensis Sonn. 12 and Camellia sinensis 13 . However, current research on EST-SSR markers in Rhododendron is focused primarily on marker development and genetic diversity analysis, and the use of EST-SSR markers to identify hybrid progeny has not been reported [14][15][16][17][18] . On the one hand, there are just over two hundred highly polymorphic EST-SSR primers for different Rhododendron taxa, which is not enough to support wider application 14,[16][17][18] . On the other hand, detection has mainly relied on polyacrylamide gel electrophoresis (PAGE) with silver staining, in which results are estimated by visually comparing DNA fragment sizes with molecular weight standards, and the limited accuracy and efficiency of this method restrict the application of Rhododendron EST-SSR markers [19][20][21] . In this study, when faced with the identification of

Results
Primer design and selection. A total of 2585 azalea EST sequences were downloaded from the NCBI nucleic acid database, and 436 redundant sequences were removed using CD-HIT-EST to obtain 2149 non-redundant sequences. MISA software was then used to screen for SSR loci 22 . A total of 233 SSR-containing sequences were obtained and 258 SSR sites were identified; 25 sequences contained more than one SSR locus, 13 of which contained compound SSRs. The total length of the sequences was 937.496 kb, corresponding to an average of one EST-SSR site per 3.63 kb. The distribution of the different repeat types is shown in Table 1. The most abundant repeat types were mononucleotides and dinucleotides, which accounted for 39.15% and 34.88% of the SSRs, respectively, followed by trinucleotide repeats (22.09%), with a few tetranucleotide (0.78%), pentanucleotide (2.33%) and hexanucleotide (0.78%) repeats. Among the repeat motifs, A/T (101) and AG/CT (85) were by far the most frequent.
Statistics on the length of the azalea EST-SSRs and the distribution of repeat numbers showed that mononucleotide and dinucleotide motifs had the highest repeat numbers. Trinucleotide motifs had fewer than 12 repeats, tetranucleotide and pentanucleotide motifs had fewer than 7 repeats, and hexanucleotide motifs occurred only twice, with 4 and 5 repeats (Table 2).
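The SSR screening step described above can be illustrated with a minimal sketch. It is not MISA itself: it only finds perfect repeats with a regular expression, and the minimum repeat numbers in `MIN_REPEATS`, the function name `find_ssrs` and the test sequence are assumptions made for illustration, since the thresholds used in this study are not stated here.

```python
import re

# Minimum repeat counts per motif length (assumed values, not taken from the paper).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k in sorted(min_repeats):
        # A motif of length k followed by at least (min - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats[k] - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT".
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical EST fragment with one mononucleotide and one dinucleotide SSR.
example = "GGC" + "A" * 12 + "CCT" + "AG" * 7 + "TTC"
print(find_ssrs(example))
```

Compound SSRs, which MISA reports when two loci lie close together, are not handled by this sketch.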
Using Primer-BLAST for primer design, 18 SSR sites were finally selected: 2 mononucleotide repeats, 6 dinucleotide repeats, 3 trinucleotide repeats, 1 pentanucleotide repeat, 1 hexanucleotide repeat and 5 compound repeats. Three pairs of primers were designed for each SSR site, giving a total of 54 primer pairs. Invalid primers were then removed by comparison against the NCBI database, leaving 19 primer pairs. Together with 30 pairs of azalea EST-SSR primers already published in the literature, a total of 49 primer pairs were used to amplify DNA from all azalea samples to be tested. Six band-pattern types appeared in the amplified bands of the maternal and paternal parents: bbaa, bcaa, aaab, acab, ccab and cdab. Analysis of the 88 hybrid offspring revealed that their band patterns could be divided into four types: heterozygous, paternal, maternal and other. The heterozygous type refers to offspring that contain specific bands from both parents, as shown in Fig. 1. The paternal type refers to offspring that contain the father's specific bands but not the mother's specific bands, as shown in Fig. 2. The maternal type refers to offspring that contain the mother's specific bands but not the father's specific bands. The other types are relatively complex, and some offspring produced new bands. Among these band types, only the paternal type and the heterozygous type could be used to identify the authenticity of the offspring, so 15 primer pairs were selected from the 49 screened azalea EST-SSR primer pairs for authenticating the offspring of azalea crosses, as shown in Table 3. These 15 primer pairs were used to amplify the DNA of all 94 samples (6 parents and 88 offspring), and the results are shown in Table 4.
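The four offspring band types can be expressed as a simple set comparison. The sketch below is one possible reading of the definitions in the text: alleles are represented by fragment sizes in bp, the function name `classify_offspring` and all sizes are hypothetical, and offspring carrying a band found in neither parent, or only bands shared by both parents, are assigned to "other" as an assumption.

```python
def classify_offspring(mother, father, offspring):
    """Classify an offspring band pattern at one locus.

    Each argument is a collection of allele sizes (bp) scored for that plant.
    """
    mother, father, offspring = set(mother), set(father), set(offspring)
    maternal_specific = mother - father   # bands seen only in the mother
    paternal_specific = father - mother   # bands seen only in the father

    # Assumption: any band absent from both parents puts the offspring in "other".
    if offspring - (mother | father):
        return "other"

    has_maternal = bool(offspring & maternal_specific)
    has_paternal = bool(offspring & paternal_specific)
    if has_maternal and has_paternal:
        return "heterozygous"
    if has_paternal:
        return "paternal"
    if has_maternal:
        return "maternal"
    return "other"  # only bands shared by both parents: uninformative


# Hypothetical cd x ab locus: mother 150/156 bp, father 162/168 bp.
print(classify_offspring({150, 156}, {162, 168}, {150, 162}))  # heterozygous
# Hypothetical aa x ab locus: mother 150 bp, father 150/162 bp.
print(classify_offspring({150}, {150, 162}, {150, 162}))       # paternal
```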
The percentage of polymorphic loci for the 15 primer pairs was 100%, with 95 alleles in total, a mean number of observed alleles (Na) of 6.3333 and a mean number of effective alleles (Ne) of 3.9622. The large gap between the observed and effective numbers of alleles indicates that the alleles are unevenly distributed in the population. PIC ranged from 0.2400 (S9) to 0.8925 (S14), with a mean of 0.5767. The PIC values of R13, R14, R15, R26, R28, S1, S10, S14 and S19 were > 0.5000, so these primers are highly polymorphic and can serve as candidate core primers for progeny identification.
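To make the relationship between Na and Ne concrete, the following sketch computes the usual single-locus indices from a list of allele calls using their standard textbook definitions (Ne = 1/Σp², Nei's H = 1 − Σp², Shannon's I = −Σp ln p, and PIC = 1 − Σp² − ΣΣ2pᵢ²pⱼ²). It is not the Popgen32 or Power Marker implementation, and the function name and allele sizes are hypothetical.

```python
from collections import Counter
from math import log


def diversity_indices(alleles):
    """Single-locus diversity indices from a flat list of allele calls
    (two calls per diploid sample)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]
    hom = sum(p * p for p in freqs)  # sum of squared allele frequencies
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {
        "Na": len(freqs),                       # observed number of alleles
        "Ne": 1 / hom,                          # effective number of alleles
        "H": 1 - hom,                           # Nei's gene diversity
        "I": -sum(p * log(p) for p in freqs),   # Shannon's information index
        "PIC": 1 - hom - cross,                 # polymorphic information content
    }


even = diversity_indices([150, 150, 156, 156])   # two alleles, equal frequency
skewed = diversity_indices([150] * 9 + [156])    # two alleles, 0.9 and 0.1
print(even["Na"], round(even["Ne"], 4))
print(skewed["Na"], round(skewed["Ne"], 4))
```

With equal allele frequencies Ne equals Na; as frequencies become uneven, Ne falls below Na, which is the pattern described for the 15 primer pairs.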
Authentication of azalea hybrid offspring. The amplified bands of all test samples were compared by fluorescence capillary electrophoresis. Based on the results, all tested offspring were true hybrids. During the experiment, some primers performed well in some hybrid combinations but poorly in others, indicating that the versatility of these primers was limited (Table 5). However, some primers, such as S14, S1, R29, R28 and R26, had strong discriminatory ability and could identify more than half of the hybrid offspring in all three hybrid combinations. Therefore, these 5 primer pairs are recommended as core primers for the identification of azalea offspring. They can be used as the central primers in a multiple-marker combination approach to verify the authenticity of hybrid offspring, as shown in Table 6. The R. pulcherum 'Zihe' × R. 'Red Apple' combination requires at least two primer pairs for offspring identification, whereas the R. 'Zibo' × R. 'Kirin' and R. pulcherum 'Baihe' × R. 'Pink Bubble' combinations can each be identified with one primer pair. The offspring of all three crosses are true hybrids and can be identified with as few as two primer pairs. The cluster analysis of the three crosses shows that the R. pulcherum 'Zihe' × R. 'Red Apple' combination can be divided into three categories at a similarity coefficient of 0.66, as shown in Fig. 3a. The first category contained 8 offspring clustered with a parent, representing 26.7% of the total; the second category contained 21 offspring clustered with a parent, accounting for 70%; and the third category consisted of offspring A8 alone.
It can be seen that the offspring of this cross were strongly influenced by the parents and appeared to be paternally biased. It is noteworthy that A8 did not cluster with either parent, presumably because more genetic mutation or recombination occurred during the cross. The R. pulcherum 'Baihe' × R. 'Pink Bubble' combination can be divided into six groups at a similarity coefficient of 0.64, as shown in Fig. 3c. Eight offspring formed a group with the female parent, accounting for 29.6%, and 6 offspring formed a group with the male parent, accounting for 22.2%. The remaining 13 offspring, accounting for 48.1%, were at a relatively large genetic distance from the parents, indicating that a large number of genetic mutations or recombination events occurred within this cross, resulting in more super-parental offspring. The R. 'Zibo' × R. 'Kirin' combination can be divided into 3 categories at a similarity coefficient of 0.68, as shown in Fig. 3b. Twenty-one offspring were grouped with one parent, accounting for 67.7%, the female parent formed a group by herself, and 10 offspring were
Table 6. Identification of Azalea hybrid offspring with combined markers.

Discussion
Design and screening of 15 pairs of Azalea EST-SSR primers with potential for hybrid offspring identification. The experimental results revealed that EST-SSR primers targeting dinucleotide and trinucleotide repeats are more advantageous in terms of versatility and polymorphism, which is consistent with the results of Mei-Qin Li 18 , so primer design can preferentially target dinucleotide and trinucleotide EST-SSRs. In the primer selection process, all 6 parental band-pattern types (bbaa, bcaa, aaab, acab, ccab, cdab) were found to have potential for identifying hybrid offspring, but the identification efficiency of each type differed. In this study, primers S1 and S14 were of the cdab band type and had the highest identification rates, 95.45% and 97.73%, respectively, indicating that the cdab band type has the highest identification efficiency and can be the first choice for the identification of hybrid progeny, which is consistent with the results of Lei Yu et al. 13 . In offspring identification, using the fewest primers to identify the most individuals is the direction of primer optimization 23 , which requires primers with good polymorphism and versatility. EST-SSRs are located in transcribed regions of genes, so their sequences are more conserved and more transferable. Although they are less polymorphic than genomic SSR primers, some studies have shown that there is no significant difference between the two in the evaluation of genetic diversity 24 . The Na, I and PIC of the 15 primer pairs investigated in this experiment were 6.3333, 1.2977 and 0.5767, respectively, showing abundant polymorphism. In particular, the EST sequences underlying the five primer pairs S14, S1, R29, R28 and R26 were generated from cDNA clones randomly selected from different tissues and organs or at different developmental stages.
If an EST-SSR is associated with a trait during the selection process, the marker may be related to the gene controlling this trait [25][26][27] . For example, the EST-SSR targeted by primer S1 comes from a cDNA library of cold-acclimated and non-cold-acclimated azaleas 28 . The relationship between this primer and the cold resistance of azalea should be further explored. In addition, cDNA libraries can be constructed to develop EST-SSRs for screening other resistance traits of azalea, such as salt resistance and high-temperature resistance. If EST-SSR offspring identification is combined with marker-assisted selection to screen at an early stage of azalea development, excellent offspring that are both authentic hybrids and carry the target characteristics can be selected, thus greatly improving the breeding efficiency of azalea.
Only a minimum of four primer pairs are required to determine the authenticity of all crossed offspring of the three cross combinations in this experiment. In this experiment, EST-SSR fluorescent molecular markers were used to identify the hybrids among all azalea samples tested. The traditional PAGE silver-staining technique is tedious and time-consuming and, especially with large or multiple batches of experiments, is prone to human error 21 . In contrast, the fluorescent capillary technique can accurately determine the size of amplified fragments with high sensitivity and reproducibility, is suitable for the detection of large numbers of hybrid offspring 29 , and can greatly improve detection efficiency and accuracy 20 .
The relatively conserved nature of EST-SSRs makes them more transferable, which helps to reveal the genetic relationships between different genotypes 30,31 , and they even show good transferability between different species, genera and families [32][33][34] . Applying this characteristic to the identification of azalea hybrid offspring can improve the efficiency of primer use. The experimental results showed that the R. pulcherum 'Zihe' × R. 'Red Apple' combination required at least two primer pairs for offspring identification, whereas the R. 'Zibo' × R. 'Kirin' and R. pulcherum 'Baihe' × R. 'Pink Bubble' combinations could each be identified with 1 primer pair. Overall, only 2 primer pairs were required to identify all the hybrid offspring of these 3 cross combinations, indicating that the screened primers have good generality among the 3 combinations. To ensure the reliability of the results, each hybrid offspring must be identified at least twice before it is confirmed as a true hybrid 35,36 , so four primer pairs were required to identify all hybrid offspring accurately in this study. EST-SSR fluorescent molecular markers have thus proved feasible for the identification of azalea hybrids and have important value in improving breeding efficiency. However, because of the limited number of samples involved in this experiment, the assessment of the primers' discriminatory ability is necessarily limited. In the future, the test can be expanded by adding more hybrid combinations and hybrid offspring to verify the generality and discriminatory ability of the primers.
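The idea of covering every offspring with the fewest primers, while confirming each one at least twice, can be sketched as a greedy selection. This is an illustrative heuristic only, not the procedure used in this study, and it does not guarantee the true minimum; the function name `select_primers`, the primer labels P1 to P4 and the offspring IDs are hypothetical.

```python
def select_primers(identified, min_hits=2):
    """Greedily choose primers until every offspring is confirmed by at
    least min_hits primers.

    identified: dict mapping primer name -> set of offspring IDs that the
    primer confirms as true hybrids.
    """
    offspring = set().union(*identified.values())
    need = {o: min_hits for o in offspring}   # confirmations still required
    remaining = dict(identified)
    chosen = []
    while any(n > 0 for n in need.values()):
        # Pick the primer that adds the most still-needed confirmations.
        best = max(
            sorted(remaining),
            key=lambda p: sum(1 for o in remaining[p] if need[o] > 0),
            default=None,
        )
        if best is None or not any(need[o] > 0 for o in remaining[best]):
            raise ValueError("primers cannot confirm every offspring min_hits times")
        for o in remaining.pop(best):
            if need[o] > 0:
                need[o] -= 1
        chosen.append(best)
    return chosen


# Hypothetical results: which offspring (1-4) each primer identifies.
results = {"P1": {1, 2, 3, 4}, "P2": {1, 2}, "P3": {3, 4}, "P4": {1, 2, 3}}
print(select_primers(results, min_hits=1))
print(select_primers(results, min_hits=2))
```

Raising `min_hits` from 1 to 2 increases the number of primers needed, which mirrors the step from two to four primer pairs described above.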
The more distant the parents are, the richer the diversity of the offspring. This study found that, compared with the parents, the number of alleles and the polymorphic information content of the offspring increased to different degrees. When the amplified band sizes of the hybrid offspring were compared, the offspring of all three hybrid combinations showed, in addition to bands inherited from the parents, new bands and the loss of parental loci; the more distantly related R. pulcherum 'Zihe' × R. 'Red Apple' combination produced the highest number of new bands. This situation is also widespread in woody plants such as Litchi chinensis Sonn. 13 and Camellia sinensis 37 , probably because SSR polymorphism is based on differences in the number of repeats in the amplified region: mutations within the primer binding region can produce null alleles, while mutations between the primer regions may generate new alleles. It is therefore presumed that these three hybrid combinations formed new genotypes through genetic recombination during hybridization, so that bands of different lengths were produced when the DNA was amplified with the same primers 13 . The cluster analysis shows that the parental genetic distances of the three cross combinations differ. The R. pulcherum 'Zihe' × R. 'Red Apple' and R. 'Zibo' × R. 'Kirin' combinations were more strongly affected by parental inheritance, with more paternally biased offspring, whereas the R. pulcherum 'Baihe' × R. 'Pink Bubble' combination, whose parents are genetically closer, produced more super-parent offspring. This may be because excessively large genetic differences between parents inhibit chromosomal recombination in the offspring, resulting in biased segregation.
All three cross combinations showed extensive genetic variation and rich genetic diversity in the offspring, and this rich genotypic variation may cause phenotypic diversity 37 .
EST-SSR marker. All Rhododendron-related EST sequences were downloaded from the NCBI nucleic acid database (as of November 2019). Redundant sequences were removed with the CD-HIT-EST online software (http://weizhong-lab.ucsd.edu/cdhit_suite/cgi-bin/index.cgi?cmd=cd-hit-est), SSR sites were screened with the MISA online software (https://webblast.ipk-gatersleben.de/misa/), primers were designed with Primer-BLAST, and Oligo 7 was used to screen the primers to ensure that there were no hairpin structures, dimers or mismatches 22 . The designed primers were checked by BLAST against the NCBI database to remove invalid primers. In addition to the self-designed primers, 30 azalea EST-SSR primers were selected from the published literature for evaluation. Based on the results of the relationship analysis and progeny identification of azalea, universal primers with clear bands and good polymorphism were selected for primer synthesis and fluorescence capillary electrophoresis. Capillary electrophoresis was performed using an ABI3730xl sequencer, and the results were read and analyzed using Gene Marker V 3.0.1 39 . By comparing the fluorescence signal of each amplified fragment with the molecular weight standard, the fragment size for each signal was obtained proportionally and saved in Excel.
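The proportional sizing of a fragment against the molecular weight standard can be illustrated by linear interpolation between the two flanking standard peaks. This is a simplified sketch, not the sizing algorithm implemented in Gene Marker, and the function name `call_fragment_size`, the migration times and the standard sizes are hypothetical.

```python
from bisect import bisect_right


def call_fragment_size(peak_time, standard):
    """Estimate a fragment size (bp) from its migration time.

    standard: list of (migration_time, size_bp) pairs for the size-standard
    peaks run in the same capillary.
    """
    standard = sorted(standard)
    times = [t for t, _ in standard]
    if not times[0] <= peak_time <= times[-1]:
        raise ValueError("peak lies outside the size-standard range")
    # Index of the standard peak just above the sample peak, clamped so that
    # a peak sitting exactly on the first or last standard is still bracketed.
    i = min(max(bisect_right(times, peak_time), 1), len(times) - 1)
    (t0, s0), (t1, s1) = standard[i - 1], standard[i]
    # Proportional (linear) interpolation between the two flanking standards.
    return s0 + (s1 - s0) * (peak_time - t0) / (t1 - t0)


# Hypothetical size standard: three peaks at 100, 150 and 200 bp.
ladder = [(1000, 100), (1200, 150), (1400, 200)]
print(call_fragment_size(1100, ladder))
```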
Data Formater was used to convert the raw data into a format readable by the analysis software 40 , and Popgen32 and Power Marker were used to calculate the number of alleles (Na), number of effective alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), Nei's genetic diversity index (H), Shannon's information index (I) and polymorphic information content (PIC) 41 .

Additional information.
All plant materials are cultivated species, not wild or endangered plants. The experiments comply with the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on the Trade in Endangered Species of Wild Fauna and Flora. The experimental research, including the collection of plant materials, complies with relevant institutional, national and international guidelines and legislation. All plant materials belong to Shanghai Botanical Garden and are kept in Wanjing Rhododendron Breeding Garden, and the use of these materials was approved by the above two institutions.

Data availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.